
Record: Order-Adaptive Entropy Gating + BackoffNgramMixer (val_bpb=0.5466)#798

Open
travispchen wants to merge 1 commit into openai:main from travispchen:oaeg-backoff-ngram

Conversation

@travispchen

Order-Adaptive Entropy Gating + BackoffNgramMixer + Drift-Free TTT

val_bpb: 0.5466 (3-seed mean, std 0.0010) | ~15.99 MB | 8×H100 SXM

Adds order-adaptive entropy gating on top of PR #779's BackoffNgramMixer + Drift-Free TTT submission. Instead of using a single entropy center for all n-gram orders, each order gets its own threshold — higher orders are trusted at lower entropy, lower orders only kick in when the model is more uncertain.

Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128)

| Seed | step_avg | steps | Pre-TTT bpb | Post-TTT bpb | TTT gain | TTT time | Artifact (bytes) |
|------|----------|-------|-------------|--------------|----------|----------|------------------|
| 1337 | 99.3ms | 5,863 | 1.1279 | 0.5478 | -0.5801 | 607s | 15,995,959 |
| 42 | 98.3ms | 5,863 | 1.1362 | 0.5458 | -0.5904 | 606s | 15,979,251 |
| 2025 | 99.2ms | 5,869 | 1.1369 | 0.5463 | -0.5906 | 607s | 15,994,227 |
| Mean | 98.9ms | 5,865 | 1.1337 | 0.5466 (std 0.0010) | -0.5871 | ~607s | |

What Changed vs PR #779

PR #779 uses a single entropy_center=3.5 for all n-gram orders. We replace this with per-order entropy centers:

```python
# PR #779 (single entropy center for all orders)
alpha = alpha_min + (alpha_max - alpha_min) * sigmoid(2.0 * (entropy - 3.5))

# This submission (per-order entropy centers)
ent_centers = {7: 3.0, 6: 3.2, 5: 3.5, 4: 3.8, 3: 4.2, 2: 4.5}
ent_center = ent_centers[matched_order]
alpha = alpha_min + (alpha_max - alpha_min) * sigmoid(2.0 * (entropy - ent_center))
```

Higher-order n-grams (7, 6, 5) are trusted at lower model entropy — when the model is fairly confident, the precise n-gram correction refines the prediction. Lower-order n-grams (4, 3, 2) only intervene at higher entropy — when the model is confused enough that even coarse statistics help.
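The gating described above can be sketched as a standalone function. Note that `order_adaptive_alpha` and the `alpha_min`/`alpha_max` defaults here are illustrative assumptions; the submission's actual bounds come from PR #779's BackoffNgramMixer.

```python
import math

# Per-order entropy centers from the PR: higher orders use lower centers,
# so their correction activates at lower model entropy.
ENT_CENTERS = {7: 3.0, 6: 3.2, 5: 3.5, 4: 3.8, 3: 4.2, 2: 4.5}

def order_adaptive_alpha(entropy, matched_order, alpha_min=0.0, alpha_max=0.5):
    """Mixing weight for the n-gram distribution at a given model entropy.

    alpha_min/alpha_max are hypothetical defaults for illustration only.
    """
    center = ENT_CENTERS[matched_order]
    gate = 1.0 / (1.0 + math.exp(-2.0 * (entropy - center)))  # sigmoid
    return alpha_min + (alpha_max - alpha_min) * gate
```

At an entropy of 3.5, an order-7 match (center 3.0) receives a larger weight than an order-2 match (center 4.5), which is exactly the intended asymmetry: precise statistics engage early, coarse statistics only under high uncertainty.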

This is an eval-time-only change. It modifies how existing n-gram statistics are combined with neural predictions, not when data enters the cache. The n-gram cache is still updated strictly AFTER scoring each batch (score-first).
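The score-first ordering can be sketched as a minimal eval loop. This is a toy bigram-only cache for illustration; the actual mixer maintains counts for orders 2 through 7, and `evaluate_score_first`/`score_fn` are hypothetical names.

```python
from collections import defaultdict

def evaluate_score_first(batches, score_fn):
    """Score each batch with the cache as it stood BEFORE that batch,
    then fold the batch's n-grams in. Predictions therefore never see
    statistics derived from the tokens being scored."""
    cache = defaultdict(int)  # bigram -> count (toy stand-in for orders 2-7)
    losses = []
    for tokens in batches:
        losses.append(score_fn(tokens, cache))  # sees only past batches
        for a, b in zip(tokens, tokens[1:]):    # update strictly AFTER scoring
            cache[(a, b)] += 1
    return losses, cache
```

The key property is that the first batch is always scored against an empty cache, and batch *k* is scored against exactly the statistics of batches 1..*k*-1.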

Legality

  • Score-first: N-gram cache updated AFTER scoring each batch. No future tokens leak into predictions.
  • No oracle selection: Alpha depends only on model entropy and n-gram order, not on ground truth.
  • Artifact size: All seeds strictly under 16,000,000 bytes (max: 15,995,959).
  • Training time: Capped at 600s (10 min) on 8×H100 (actual: ~582s).
  • Eval time: TTT eval ≤607s on 8×H100.

Ablation

| Change | Post-TTT bpb | Delta |
|--------|--------------|-------|
| PR #779 baseline (single entropy center) | 0.6713 | |
| + Order-adaptive entropy gating | 0.5478 | -0.1235 |

Credits

  • BackoffNgramMixer + Drift-Free TTT + Base model: PR #779
  • Order-adaptive entropy gating: This submission

…5466, 3-seed mean)

Adds order-adaptive entropy gating on top of PR openai#779's BackoffNgramMixer + Drift-Free TTT.
Per-order entropy centers replace single threshold: higher n-gram orders trusted at lower entropy.
3-seed validation: 0.5478, 0.5458, 0.5463 (mean 0.5466, std 0.0010).
All artifacts strictly under 16,000,000 bytes.

Co-Authored-By: Travis Chen <travispchen@gmail.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
Add per-order entropy centers from PR openai#798 insight:
  order 7: center=3.0, order 6: 3.2, order 5: 3.5,
  order 4: 3.8, order 3: 4.2, order 2: 4.5
Higher orders trusted at lower entropy, lower orders only at high
uncertainty. Cubric multipliers applied on top.

Original X-WING (0.5644) untouched in concepts/xwing/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
PR openai#798's approach on our engine: per-order entropy centers
(7:3.0, 6:3.2, 5:3.5, 4:3.8, 3:4.2, 2:4.5) without cubric.
Testing if cubric was hurting when combined with per-order gating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
